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Abstract 

Among the many ways to model signals, a recent approach that draws considerable attention is sparse rep- 
resentation modeling. In this model, the signal is assumed to be generated as a random linear combination 
of a few atoms from a pre-specifled dictionary. In this work we analyze two Bayesian denoising algorithms 
- the Maximum-Aposteriori Probability (MAP) and the Minimum-Mean-Squared-Error (MMSE) estimators, 
under the assumption that the dictionary is unitary. It is well known that both these estimators lead to a 
scalar shrinkage on the transformed coefficients, albeit with a different response curve. In this work we start by 
deriving closed-form expressions for these shrinkage curves and then analyze their performance. Upper bounds 
on the MAP and the MMSE estimation errors are derived. We tie these to the error obtained by a so-called 
oracle estimator, where the support is given, establishing a worst-case gain- factor between the MAP/MMSE 
estimation errors and the oracle's performance. These denoising algorithms are demonstrated on synthetic 
signals and on true data (images). 
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1. Introduction 



A classical and long-studied subject in signal processing is denoising. This task considers a given measure- 
ment signal y G R" obtained from a clear signal w £ E" by an additive contamination of the form y = w + v. 
We shall restrict our discussion to zero mean i.i.d. Gaussian noise vectors v G K", with each entry drawn at 
random from the normal distribution J\f (O, a^). The denoising goal is to recover w from y. 

An effective denoising algorithm assumes knowledge about the noise characteristics, like the above de- 
scription, and introduces some assumptions about the class of signals to which w belongs, that is, a-priori 
knowledge about the signal. There is a great number of algorithms today, corresponding to a variety of signal 
models. Among these, a recently emerging group of techniques relies on sparse and redundant representations 
for modeling the signals 

A signal w is said to have a sparse representation over a known dictionary, D £ R"^™, if there exists 
a sparse vector x e R™ such that w — Dx. The vector x is the representation of w, having a number of 
non-zeros, ||x||q = fc, which is much smaller than its length, m. Thus, x describes how to construct w as 
a linear combination of a few columns (also referred to as atoms) of D. In general, the dictionary may be 
redundant, containing more atoms than the signal dimension (m > n). 

Assuming that w = Dx with a sparse representation x, how can one recover w from the noisy measurement 
y? By posing a prior probability density function over x, one can derive the exact Maximum-A'posteriori 
Probability (MAP) estimator for this task. This becomes a search for the support of the sparse representation 
X that maximizes the posterior probability. This problem is computationally complex, as it generally requires 
an exponential sweep over all the possible sparse supports |17|. Therefore, approximation methods are often 
employed, such as the Orthogonal Matching Pursuit (0MP)[3| and the Basis Pursuit (BP) 

While MAP estimation promotes seeking a single sparse representation to explain the measurements, recent 
work has shown that better resultJl] are possible using the Minimum Mean Square Error (MMSE) estimator 



14l 120, These works develop MMSE estimators, showing that they lead to a weighted average of all the 



possible representations that may explain the signal, with weights related to their probabilities. Just like MAP 
in t he g eneral setting, this estimation is infeasible to compute, and thus various approximations are proposed 
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A well known and celebrated result in signal processing is the fact that the MAP estimator mentioned 
above admits a closed-form simple formula, in the special case where the dictionary D is square and unitary 
This formula, known as a shrinkage operation, yields the estimate w by applying a simple ID 
operation on the entries of the vector D-^y. The denoised signal is then obtained by multiplication by D. 
Shrinkage tends to eliminate small entries, while leaving larger ones almost intact. 

Our recent work reported in [isl . [l^ aimed to develop an MMSE closed-form formula for the unitary 
case. With a specific prior model on x, a recursive formula for this task was developed. Thus, at least in 
principle, the implications from this work are that one need not turn to approximations, as this formula is 



In the ^2-error sense, which is often the measure used to assess performance. 
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easily computable, leading to the exact MMSE. While such a result is very encouraging, it does not provide a 
truly simple technique of the form that MAP enjoys. Furthermore, due to its recursive nature, this algorithm 
suffers from instability problems that hinder its use for high-dimensional signals. 

In the present work, we propose a modified prior model for the sparse representation vector x. We show 
that this change leads to a simplified MMSE formula, which, just as for the MAP, becomes a scalar shrinkage, 
albeit with a different response curve. As such, this exact MMSE denoising exhibits no numerical sensitivities 
as m , and thus it can operate easily in any dimension. 

The core idea that MMSE estimation for the unitary case leads to a shrinkage algorithm has been observed 
before Here we adopt a distinct approach in the derivation, which also gives us exact and 

simple expressions for MAP and MMSE shrinkage curves, and their expected ^2-errors. We use these as a 
stepping-stone towards the development of upper bounds on the MAP and the MMSE estimation errors. 

A fundamental and key question that has attracted attention in recent years is the proximity between 
practical pursuij^ results and the oracle performance. The oracle is an estimator that knows the true support, 
thus giving an ultimate result which can be used as a gold-standard for assessing practical pursuit performance. 
For example, the work reported in Q shows that the Danzig Selector algorithm is a constant (and log) factor 
away from the oracle result. Similar claims for the BP, the OMP, and even the thresholding algorithms, are 
made in 

In both these papers, the analysis is deterministic and non-Bayesian, which is different from the point of 
view taken in this paper. In this work we tie the MAP and the MMSE errors for the unitary case to the 
error obtained by an oracle estimator. We establish worst-case gain-factors of the MAP and the MMSE errors 
relative to the oracle error. This gives a clear ranking of these algorithms, and states clearly their nearness to 
the ideal performance. 

The paper is organized as follows. In section [5] we describe the signal model we shall use throughout this 
work. For completeness of the presentation, we also derive the MAP and MMSE estimators for the general 
case in this section. In Section|3]we turn to the unitary case and present the ideal MAP and MMSE estimators, 
showing how both lead to shrinkage operations. Section |4] is devoted to the development of the performance 
behavior of the MAP and MMSE estimates, and the upper bounds on their errors. Section[5]presents numerical 
experiments, demonstrating the proposed algorithms in action. In Section [6] we conclude the paper. 

2. Background 



2.1. The Signal Model 



In this model, each atom 



We consider a generative signal model that resembles the one presented in 
has a prior probability Pi of participating in the support of each signal, and (1 — Pi) of not appearing. One 
can think of the support selection stage as performing biased coin-tosses of m coins, with the i*'' coin having 



^Pursuit is a generic name given to algorithms that aim to estimate x. 
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a probability Pi of "heads" and (1 — Pi) for "tails". The coins that turn up ("heads") constitute the support 
S for this signal. Thus, the a priori probability for any support S is given by 

P{S) = \{P.-X{{l-P,). (1) 

It is important to note that, as opposed to the model used in Q, here it is not possible to explicitly 

prescribe the cardinality of the support, nor is it possible to limit it (as even the empty and full supports may 
arise by chance). If, for some i, Pi equals 0, all the supports that contain element i have zero probability. 
Similarly, if we have Pi — 1, then all the supports that do not select the i*'* atom also have zero probability. 
Hence, in our study we only need to consider values < P,; < 1 for all i, and this is assumed henceforth. 

We further assume that, given the support S, the coefficients in x on this support are drawn as i.i.d. 
Gaussian random variablef|j with zero mean and variance cr^ , 

x|5^AA(0,a2l|5|), (2) 

where I|5| is the identity matrix of size \S\. 

We measure the vector y e M", a noisy linear combination of atoms from D with coefficients x € R™, 
namely, y = Dx + v, where the noise v is assumed to be white Gaussian with variance cr^, i.e., v ^ Af (O, cP'In) 
, and the columns of D are normalized. 



^From the model assumptions made above, it can be seen [I3 that y and x are jointly Gaussians for a 
given support. 



y 

X 








C5 a^D^ 



(3) 



where 



Cs (tIJ^s^I + o'^ln, (4) 

and D5 G R"><l'5| is comprised of the columns of the matrix D that appear in the support S. Hence, the 
marginal p.d.f. P{y\S) is Gaussian and it is given by 

y\S^N{0,Cs). (5) 

Using properties of the Multivariate Gaussian p.d.f. (see P- 325]), we have that the likelihood P(y|x, 5) 
and the posterior p.d.f. P(x|y,5) are also Gaussian, namely 



y|x,5 
x|y,5 



1 
72 



(6) 
(7) 



^In fact, we may suggest a broader model of the form P{xi) ~ exp{— /(xi/a-^,)}, for an arbitrary function /(■), thus keeping 
the model very general. It appears that with this change one can still obtain MMSE-shrinkagc. Furthermore, one may also study 
the sensitivity of MMSE/MAP shrinkage-curves under perturbations of /(■), and even find the worst choice of this function, that 
leads to the maximal expected error in MMSE - all these are left to future work, as we mainly focus here on the Gaussian model. 
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where the sub- vector X5 is comprised of the elements of x whose indices are in the support S, and 
There is a direct hnk between the matrices Q5 and C5 , expressed using the matrix inversion lemma. 



= ^^n-^T^sdt^l. (9) 

2.2. MAP/MMSE Estimators - The General Case 

2.2.1. The Oracle Estimator 

The first estimator we derive is the oracle. This estimator assumes knowledge of the chosen support for x, 
information that is unknown in the actual problem. Therefore it cannot be obtained in practice. Nevertheless, 
it gives us a reference performance quality to compare against. The oracle can target the minimization of the 
MSE0. A well-known and classical result states that the MMSE estimator is equal to the conditional mean 
of the unknown, conditioned on the known parts, and thus in our case it is E{x.\y,S}. As the support S is 
known, we need to estimate X5 , the sub- vector of non-zero entries of x, so the estimator is given by 

^Oracle ^E{^s\y,S} = ^Q^'O^y, (10) 

where this equality comes from the expectation of the probability distribution in ([7]) . 

2.2.2. Maximum A-Posteriori Estimator (MAP) 

The MAP estimator proposes an estimate x that maximizes the posterior probability. As the model mixes 
discrete probabilities Pi with continuous ones P(x|5), the MAP should be carefully formulated, otherwise, 
the most probable estimate would be the zero vector. Thus, we choose instead to maximize the posterior of 
the support, 

S^'^P = argmaxF(5|y), (11) 
s 

and only then compute the corresponding estimate ^^mav . We know from Equation ([T]), that x|y,5 behaves 
as a normal distribution, and thus the estimate 'x.mA'P is given by the oracle in (jlOp with the specific support 
gMAP ^ Using Bayes's rule. Equation pT|) leads to 

pisw) - ™a. ,12, 

Since P{y) does not depend on S, it affects this expression only as a normalizing factor. Using the expressions 
of the probabilities in the numerator that are given by Equations ([5]) and ([T]), respectively, we obtain 



*Or MAP - in fact, the two are the same in this case due to the Gaussianity of x|y,<S. 
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where we have introduced the notation tg for brevity of later expressions. Returning to our MAP goal posed 
in Equation pT|) . applying a few simple algebraic steps on the expression for P{S\y) leads to the following 
penalty function, which should be maximized with respect to the support S, 

1 2 



Val{S) 



(14) 



i log det Cs + J2 log + (1 - ^^■) ' 

2 ies ji^s 

over all 2™ possible supports. Once found, we obtain the MAP estimation by using the oracle formula from 

Equation (jlO[) . which computes X5 for this support. 

2.2.3. Minimum Mean Square Error Estimator (MMSE) 

The MMSE estimate is given by the conditional expectation, £'{x|y}, 

X*"'^^=i?{x|y}= / xP(x|y)dx. 



(15) 



Marginalizing the posterior probability P(x|y) over all possible supports 5 € 57, we have 

P(x|y) = 5]P(x|y,5)P(5|y). 

sen 

Plugging Equation ([TC]) into Equation (ITSl) yields 

^MMSE ^ Y.nS\y) I xP(x|y,5)dx 
sen •''^ 

= ^P(5|y)£;{x|y,5} 



(16) 



sen 



^ P(5|y)xg'■-'^ 



(17) 



5eo 



Equation ((T7)) shows that the MMSE estimator is a weighted average of all the "oracle" solutions, each with 
a different support and weighted by its probability. Finally, we substitute the expression tg developed in 
Equation into Equation PT)) . and get the formula for MMSE estimation, 

'AIMSE 1 , -Oracle 



(18) 



sen 



where t = X^seo overall normalizing factor. 



2.3. Estimator Performance ~ The General Case 

We conclude this background section by discussing the expected Mean-Squared-Error (MSE) induced by 
each of the estimators developed above. Our goal is to obtain clear expressions for these errors, which will 
later serve when we develop similar and simpler expressions for the unitary case. 

We start with the performance of the oracle estimator, as the oracle is central to the derivation of MAP 
and MMSE errors. The oracle's expected MSE is given by 

_i 1 



E 



Oracle ,^ 1 1 
S 2 



E- 



= E- 





2 








'1 




2 





Dj(D5X5+v)-X5 



y \ = trace (Q^^) 



(19) 
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where we have used Equation (|H]), and the fact that y = D5X5 + v. 

Our analysis continues with the expected error for a general estimate ic, observing that it can be written as 

i?|||x-x||^ y| = / ||x-x||^P(x|y)dx 

^P(5|y)/ ||x-x||^P(x|y,5)(ix, 
0^0 JxeR" 



(20) 



where we have used the marginalization proposed in Equation (jl6p . We add and subtract the oracle estimate 



■^Oracle 



that corresponds to the support S into the norm term, yielding 

/ ||x-x||^P(x|y,5)dx = / ||xg'--'^-x||'p(x|y,5)dx 



(21) 



\.Oracle I 



:P(x|y,5)dx. 



Note that the integral over the cross-term (x| 



^Oracle 



xGl 



-^Oracie^ vanishcs, since the term (x — xOi-acie~j 
is deterministic and can thus be moved outside the integration, while the expression remaining inside the 
integral is zero, since the oracle estimate is the expected x over this domain and with this support. 

Continuing with Equation (|2ip . the first term represents the MSE of an oracle for a given support 5, as 
derived in Equation In the second term, the norm factor does not depend on the integral variable x, and 
thus it may be pulled outside the integration. The remaining part is equal to one. Therefore, 



/ ||x - XII2 P(x|y, 5)dx = trace (Q^^) + ||x - x 



Oracle I 



(22) 



Returning to the overall expected MSE as in Equation (PO]) . using the fact that P{S\y) = tg/t, as developed 
in Equation (|13p. we have 



e{\\^-x\\1} = jJ2^s- [trace (Q^^) + ||x - x 



-^Oracle 1 1 ^ 



(23) 



sen 



By plugging x = ^^'^^'^se -^^^^ -j^j^jg expression, we get the MMSE error. Note that if we minimize the above 
with respect to x, we get the MMSE estimate formula exactly, as expected, since the MMSE is the solution 
that leads to the smallest error. 

Observe that (|23p can be written differently by adding and subtracting x'^^^^se [Tn^g[^Q norm term, 
giving 



ii; { ||x - xii^l } = ^ E ^5 ■ t^ace + 7 E *5 II* - + - *5™''1 



sen 



sen 



trace (Q^^) + ||x - x^"^^^ || J + | E II* 



MMSE --Oracle I 



Sen 



-MMSE I 



E 



{II* 



-MMSE 



'}■ 



sen 



(24) 



In this derivation, the cross-term (x — j^MMSEy ^^mmse _ ^Oracie^ drops out, since in this summation the 
term (x — j^^MSEy positioned outside the summation, and then, using Equation (fT5|) . it is easily 
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shown that we are left with an expression that equals x^t'^iSE _ ^mmse _ q have then a general error 
formula for any estimator, given by equation p4p. In particular, this means that the error for the MAP 
estimate can be calculated by 



{llx^^^^-xll^ly} = ||x*^^^-x^"'^^^||'+i?{||x^"^^^-x||2|y}. (25) 



3. MAP & MMSE Estimators for a Unitary Dictionary 

The derivation of MAP and MMSE for a general dictionary leads to prohibitive computational tasks. As 
we shall see next, when using unitary dictionaries, we are able to avoid these demanding computations, and 
instead obtain closed-form solutions for each one of the estimators. Furthermore, the two resulting algorithms 
are very similar, both having a shrinkage structure. 

While this claim about MAP and MMSE leading to shrinkage is not new , our distinct devel- 

opment of the closed-form shrinkage formulae will lead to a simple computational process for the evaluation 
of the MAP and the MMSE, which will facilitate the performance analysis derived in Section |4l 

3.1. The Oracle 

Just as for the general dictionary case, we start by deriving an expression for the oracle estimation. In this 
case, we assume that the dictionary D is a unitary matrix, and thus D^D = I. Moreover, it is easily seen 

>?D5=I|5|, 



that D^D^ — which will simplify our expressions. We start by simplifying the matrix Q5 defined in ([8|), 



Q^-^%, + >?D.^^I„, (26) 



The oracle solution, as given in Equation ()10|) . becomes 



1 



XO.ade^_Q^lj3Ty^^2^^^ (27) 

where we have defined the constant = cr^/(o'^ + cr^) and the vector (3^ = D^y. The oracle estimator has 
thus been reduced to a simple matrix by vector multiplication. 

3.2. The MAP - Unitary Case 

We turn to the MAP estimation, which requires to first find the optimal support S based on Equations 
and (fH)) . and then plug it into the oracle expression as given in Equation pUj) to get the estimate. 

We proceed by simplifying the expression det(C5) in Equations (fT5|) and (HH). The matrix C5 is defined 
in Equation (j4]) as C5 ~ ct^D^D^ -t- cr^I„. Denoting by W5 a diagonal matrix with ones and zeros on its 
main diagonal matching the supporj^l 5, we obtain 

det(C5) = det (ct^D^D;^ + a^i^) 

= det (D) • det (ct^W^ + (t\,) ■ det (D^) 

= (^T^-f f72)l^l(a2)"-l^l = (l-c2)-l''la2". (28) 



^(W^)ii is 1 if j £ iS, and elsewhere. 
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Plugging this result into Equation (|13p . and using the relation between C5 and Q5 in Equation ©, yields 
P(5|y) « exp|^y^D5Djy-ilog(l-c2)-|^l|n^'.-ni-^^ 

cx exp|^||/35||^|n^*VT^-ni-^:' 



« n^^pj^^^'l^'^i^-ni-^:'- (29) 

ies ^ 



Taking into account that < -P; < 1, we can rewrite this expression as 



P{S\y) « nexp|^/3n^v/r3^.[]l-P,. 



P, 



ies ^ ' j=i 



n -p { £2^^ \ T%. - n (30) 



ies ^ ^ ies 



where we have defined 



9. = cxp 1^/3,^1 



(31) 



We further define gi — qi/{l + Qi) (which implies that qi — gi / — gi)) , and substitute this into Equation ([30 
Adding now the necessary normalization factor we get 

Xs'enieS' I ies 

9i \ TT 9i 



Ks'enies* ies 
T,s*en I\^es* 9i Uj^s- (1 - 9j) \ U,es 9i Ilj^s (1 " 9j 



nLi(l-5fc) J nLi(l-5fe) 

Y.\{9^\{{l-9J)\ \{9.\{{l~9,)- (32) 

^S'emeS' j<^s' / ies jt^s 

The following observation will facilitate a further simplification of this expression: 

Proposition 1. Let be the set of all possible subsets of n indices, and let g^ be values associated with each 
index, such that < < 1. Then, 

Y.X{a^-X{{i'9,) = i- (33) 

Semes j(^s 

Proof. Consider the following experiment: a set of n independent coins are tossed, with the i*'' coin having 
a probability gi for "heads" and (1 — gi) for "tails". The probability of a specific set of S coins turning up 
"heads" (and the rest turning up "tails" ) is Yiies 9i ' Ilj^s (1 ~ .9j)- For any one toss of the n coins, exactly one 
of these combinations will be the outcome. Therefore, the sum of these probabilities over all the combinations 
must be 1. □ 
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Using tliis proposition, the normalization term in Equation p2p vanishes, as it is equal to 1 (0 < < 1 

l+qi 



since Qi = and > for every i). We therefore obtain 



P{S\Y) = \{cn\{{l-g,). (34) 



The optimization task (jlip can now be written as 



S-^^-^ = argmax U g.^U (1 - g,) 

ies jfs 



arg max TT — TT ( 1 ^ T-^ — 



= nLi(i + gfe) ^"''"'^ l^-p|^ft|T37^vi-c2. (35) 

Interpreting this expression, we see that every element in the support influences the penalty in one of two 
ways: 

• If it is part of the support: Multiply the expression by \/l — ^^^p exp or 

• If it is not in the support: Multiply the expression by 1. 



As we aim to maximize the expression in Equation ([35]), the support will contain all the elements i such that 
y/1 — ■ ■ exp I > 1- (In the case that no such element exists, the support should be empty and 

the solution is therefore x^'^^p — 0.) Once these elements are found, all we have to do is to multiply their 
value /3i by and this is the MAP estimate. 

Stated differently, this means that after computing the transformed vector /3 = D-^'y, we test each of its 
entries, and set the MAP estimate for the i*'' entry to be 



. ^2 

:.MAP 



IAI>4^Jiog(^) 



^f^^ = ^MAp(ft) = <( 'V °vvi-c^^.y . (36) 

otherwise 

This is the shrinkage algorithm mentioned earlier - each entry is handled independently of the others, passing 
through a scalar shrinkage curve that nulls small entries and keeps large ones intact (up to the multiplication 
by c^). There is no trace of the exhaustive and combinatorial search that characterizes MAP in the general 
case, and this simple algorithm yields the exact MAP estimation. 

3.3. The MMSE - The Unitary Case 

Equation ([T51) shows the presence of the oracle in the MMSE estimation. Similarly to MAP, we make use 
of the unitary oracle estimate in Equation ((27|) . Note that (3^ may be written as 



/35-^l5(fc)/3fcefc, (37) 
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where is the fc*'* vector in the canonical basis, and ls{k) is an indicator function {ls{k) = 1 if fc e 5, and 
zero otherwise). While this may seem like a cumbersome change, it will prove valuable in later derivations. 
Starting from Equation p^ . substituting the expression developed for P{S\y) in Equation ([M)) into Equation 
([T51) . and using Equation (1571) . we obtain the following expression for the unitary MMSE estimator. 



^MMSE 



E 

sen 



ies ji^s 



\k=l 



fc=i 



f3kek 



(38) 



We introduce now another observation, similar to the one posed in Proposition [TJ This will be used to further 
simplify the above expression. 

Proposition 2. Let Q be the set of all possible subsets of n indices, and let gi be values associated with each 
index, such that < < 1. Then, 



sen ies j(^s 



(39) 



Proof. In the spirit of the coin tossing interpretation described in the proof of Proposition [1] the multiplica- 
tion by the expression Isik) implies that only toss outcomes where the fc*'' coin turns up "heads" are included 
in the summation. Thus, the overall probability of those is exactly the probability that the fc*'' coin turn up 
"heads" , which is gk as claimed. A somewhat more formal way to pose this rationale is by observing that 

J2w)U9^-I[i^-9,) = E 113^-11(1-5.) 

Sen ies j^s sen s.t. kes ies j^s 

= -^fc' E llf*- ll(i~-9j)- 

senk ies j^s 

The last summation is over the set ilk, that contains all the supports in il and do not contain the fc*'* entry. 
Thus, for the remaining n — 1 elements, this summation is complete, just as posed in Proposition [U and 



therefore the overall expression equals gk- 

Returning to the MMSE expression in Equation 
expression of the form 



□ 



and using this equality, we get a far simpler MMSE 



^MMSE _ ^2 



'^dkP, 



k^k 



k=l 



'St 

fc=i 



Qk 



Qk 



-Pk^k- 



(40) 



This is an explicit formula for MMSE estimation. The estimation is computed by first calculating /3 = D"^y, 
and then simply multiplying each entry j3k hy qk / {1 + qk) (which is a function of (3^ as well). Explicitly, the 
MMSE estimate is given elementwise by 

exp{^/32}^VT3^ 



'MMSE 



— ^MMSE^i) — 



l + exp{^/32}^yr- 



(41) 
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-4- MMSE 

' MAP 

"-5 -4 -3 -2 -1 1 2 3 4 5 
P 



(c) a = 1 

Figure 1: Shrinkage functions for MMSE and MAP estimators {Pi = 0.1, = 1). 

This operation has the form of a scalar shrinkage operation, just hke MAP. For \ f3i \ ^ cr/c this formula leads to 
^MMSE ^ whereas for |/3i| > cr/c the outcome is xf^'^^E ^ ^2^. Q^g^ j-j^g MAP). Thus, the expression 
multiplying c^/3i here serves as a soft-shrinkagj^ operation, which replaces the hard-shrinkage practiced in the 
MAP. Figure [T] shows the various shrinkage functions obtained for each estimator. 

4. Performance Analysis 

4-.1. Deriving the Estimators' MSB 

Our main goal in this work is to develop error expressions for the different estimators in the unitary regime, 
exploiting the general derivations of section l2.3l We start by calculating the error for an oracle solution xOracie^ 
Using Equation (f26| we obtain 

n 

E^W^Oracle _ j ^ ^^^^^ ^q^I^ ^ |^|^2^2 ^ ^ I^(k)c'a^ (42) 



®This should not be confused with the term soft-thresholding obtained when minimizing an £i penalty. 
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where the indicator function is the same as previously used in p7[). The last equality will become useful for 
our later development. 

Turning to the MMSE estimator, recall the general expected-MSE expression in Equation (P^ . 



sen 



ISlc'a 



,2„2 I W'MMSE ~Oracle\ 



(43) 



Using the unitary MMSE estimator expression in Equation (PU)) and that of the oracle solution in Equation 
(|27p . we further develop the second term in the expression above, and obtain 

2 



l-MMSE ~ Oracle \ 



k=l k=l 

= J2c'[9k-Isikfl3l 

n 

= Y.'''[9l-'^9kls{k)^ls{k)\pl. 



(44) 



fc=i 



Plugging this expression back into Equation (|33|), together with the expression for P{S\y) in Equation p4p . 
gives 



MSE {^MMSE^ 



sen 



k=l 



fc=i 



5en ie5 j(fs 



k=l 



E n 5« n (1 - s,) [gl - ^gklsik) + Isik)] 

Sen ies j^s 

n 

Y^c^a^gu + c^Pl (gk-gl). 
fe=i 



(45) 



Here we have exploited Proposition (5] Interestingly, the property \S\ — J2k=i'^s{k) and Proposition [5] yield 
the relationship 

E{\s\} = J2Pi^\y)\^\ 

sen 



= E Ei-w n^^n(i-5.) 

sen \k=i I ies j^s 



E 

fe=i 



Sen ies j^s 



E^fc- 

fe=i 



(46) 



This implies that the MMSE error can be alternatively written as 



MSE fx^^*^^^^' 



(47) 



fc=i 
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suggesting that the error is composed of an "oracle" erroiQ, and an additional part that is necessarily positive 
(since < gk < 1)- As an extreme example, if the elements of the vector /3 tend to be either very high or 
very low (compared to cr/c), then the gk tend to the extremes as well. In such a case, the second term nearly 
vanishes, and the performance is close to that of the oracle. 

We next study the MAP performance. Recall Equation ((23)) . and note that x*^^^ may be written as 



K^'^P = Y,iMAv{kyi3kek, (48) 



fc=i 



where lMAv{k) is an indicator function for the MAP support. Exploiting Propositions [T] and [2l we obtain the 
following expression for the MAP mean-squared-error. 



MSE (x*^^^) = E P(^\y) \ ^^^^ E + E '^MAv{k) - 2\s{k)\MAv{k) + \s{k)\ /?, 



5ef2 k fe=i fc=i 

= eI^ve 1^(^)11^/^11(1-5.) 

fe=i [ 5eo ies j^s 

+c^Pl E n (1-5,) [lMAv{k)+ls{k) - 2ls{k)lMAv{k)] 

Sen ieS j^S 

n 

= E^'^'5fc + c*/3fe[5fe + Wp(fc)(l-2gfe)]. (49) 
fc=i 

Analyzing the difference between the MMSE and MAP errors, in Equations (|l5)) and respectively, we 
find that only the last terms in each are different: — versus I^^-p(fc)(l — 2gk), respectively. Obviously, this 
implies MSE (x^v/msb^ < ^gg (^^a/ap^^ because -gl < lMAv{k){l - 2gk) for any k, and regardless of the 
value of T-MAvik) (zero or one). 

In order to further understand the estimators' performance given in the Equations (j45p and (|49p. we turn 
now to a further analysis of these expressions and derive worst-case upper-bounds for them. The bounds 
we are about to build do not depend on the dimension of the signal, but rather on the problem parameters 
{a, ax, Pi) alone. We begin with the MMSE, then turn to the MAP, and finally compare and discuss the 
resulting bounds. 

4-. 2. MMSE Performance Bound 

Referring to Equation P5)) . which describes the error associated with the MMSE approximation, we shall 
denote by MSEi the first term, 

n 

MSEi=c''<j^Y.9k- (50) 

As mentioned before, this is the expected MSE of the oracle (given y). The second term, denoted by MSE2, 
is given by 

n 

MSE2^c^J2f^k9k{l-9k). (51) 

k=l 



^See the similarity between the first term here and the one posed in Equation ((42}. 
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This is the additional error due to the fact that the support is unknown. We would like to bound the ratio 
r = MSE2/MSE1, as this immediately yields a bound (r + 1) on the MMSE error in terms of the expected 
oracle error. Our goal is thus to characterize the worst ratio 

MSE2 

max r = max — — -— , 52 

yGK" yeR^ MSEi' ^ ' 

that is, the worst (largest) ratio over all conceivable signals y, where the dependence on y enters via the /3fc's. 
In order to characterize this ratio, we shall need the following simple lemma: 

Lemma 3. Let (a^, bk) , fc = 1, . . . , n, he pairs 0/ positive real numbers. Let m be the index of a pair whose 
ratio is maximal, i.e., 

^ < ^ for all k > 1. (53) 

Ok Orn 



Th 



en 



En 
k=i «fc < 



bj bm ' 

with equality occurring only */ = = ' ' ' = • 

Proof. By ()53|) . Okhm < OLmbk for all fc > 1, with equality obtained only if Ok/bk = am/bm. Summing up all 
these inequalities, we obtain 

n n 

b,n ^ flfe < a„i ^ bj , 
k=l J=l 

hence, 

En 
k=l ^ dm 



as claimed, with equality occurring only if p- = for every i. □ 



bj b„i 



Returning to our task of bounding MSE2/MSE1, we observe that this ratio can be written as 

MSE2 ^ c^El=iPl9kil~9k) 

which is of the same form as the ratio appearing in the Lemma. This leads us to the following Theorem 



(54) 



Theorem 4. Denote Gk = VT— (?-Pfe/(l — Pk), and let m be the index corresponding to an a priori least 
likely atom, i.e., P„i = mini<fe<„ Pfe and hence, Gm = mini<fc<„G'fc. Denote fMMSE{s) = i+a^^e" ' "'^'^ 
define (implicitly) s* = argmaxs>o /a/a/sb(s)- Then 

1. r* = fMMSE{s*) is an upper-bound on the ratio MSE2/MSE1. 

2. The worst ratio, r* , satisfies the explicit bound 

,.<^ 21n(^) G„<^. 0.034^ ^^^^ 
-7=- G„ > A « 0.034 
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Proof. Starting with the first claim, we embark from Equation (j54p and exploit Lemma [3] to obtain 

MSE^ ^ c^ELi/3fcgfc(i-gfc) 

MSE^ c^(J^Y:Li9k 

< — ■ max — 

a i<k<n gi; 

< 4- max (56) 

(T l<k<n 

Recalling that g^. = qk/{^ + the definition of in pip , and the definition of Gk above, we have 

1 1 



1 - fffc = 

i + '/Zc i + ^vi-c-exp^2^/3^| 

(57) 



1 + Gk exp{c2/32/2CT2}- 
Plugging this into Equation (|56)) and denoting s — c^f3'^/2a'^, we obtain 

MSB, ^ 2s 

MSEi ~ i<k<n 1 + Gk exp{s} ^ ' 

This is a monotonically decreasing function of Gk for any fixed value of s > (note that s must be non- 
negative, due to its definition). Thus, the maximum over the indices 1 < fc < n is obtained for the index m 
for which Gk is the smallest. Therefore, 

AfCF - TT = Jmmse[s ) = r , 59 

/3 Mbhi s>o 1 + Gmexp{s| 

as claimed. 

Turning to the second claim of the theorem, we desire to bound /a/a/S£;(s) from above. To this end, we 
maximize the alternative function /(s) that bounds /mmsb(s) from above point- wise: 

Immse[s) = — - < , '^^ — ^ EE 7(s). (60) 

l + GmS.'^ max (1, zvLrme'^j 

Here we have used the facts that (i) the arithmetic mean (l + Gme^)/2 is necessarily larger than the geometric 



one, s/GraG'^ , and (n) 1 + Gm^^ > 1. 

The switch-over in the denominator of /(s) occurs when Gme'^ = 1/4, which takes place for s = Sq 



ln(l/4G'm). For s < sq, /(s) = 2s, which is monotonically increasing. For s > sq, /(s) = sj \fG^^ , whose 
derivative is given by /'(s) = (1 — s/2)/^/G^^ . Thus, if so > 2, the maximum of /(s) occurs at s = sq, being 
/(sq) = 2so = 21n(l/4G'„i). Otherwise, the maximum occurs at s = 2, being /(2) = 2/\/G^e. This proves 
the explicit upper bound on r* ^ as given in Equation (j55p . □ 

Figure [2] shows the functions fiviMSEis) and its upper bound /(s) for two possible values of Gm- 0.01 
and 0.1. These two cases correspond to the two options covered in Equation (|55|) . As can be seen, for 
Gm — 0.01 < 0.034, the maximum point is obtained on the linear part of /(s), whereas in the case of 
Gm = 0.1 > 0.034, the maximum is obtained for s = 2. Figure [3] presents the value of r* as a function of Gm- 
This figure also shows the upper-bound on this value as given in Equation (j55p . and the two sub- functions 
that comprise it. 
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hiMSEis) for G,„ = 0.01 

- -f(s) for G„ = 0.01 

Maximum location 

[mmse{s) for G,„ = 0.1 

f(s) forG,„=0.1 

Maximum location 




Figure 2: Graph plot of the function fMMSE{s) and its upper-bounding function /(s) (the soHd and the 
dashed hues, respectively), exhibiting the two cases, where the maximum changes given the value Gm- 
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Corollary 5. The expected error for the MMSE estimator is bounded for any signal y by 



MSE (x^^A/SS^ < f^^Oracle^ 



G„ > \e 



Proof. Follows from Theorem HI 



(61) 
□ 



What happens when all the probabilities Pk are equal? In such a case we obtain that Gi — G2 ~ ■ ■ ■ = Gn- 
From Equation (|56|1. which uses LemmalU it is obvious that the worst-ratio r* becomes a tight upper-bound on 
MSE2/MSE1, since all the terms in the numerator and the denominator summations are equal. Furthermore, 
the worst-case /3fc's are all equal to ±(T-\/s*/c. 

4-.3. MAP Performance Bound 

We next develop an upper-bound on the error associated with the MAP estimate in Equation (j49| . While 
MS El remains the same as in Equation ([50| . the term that corresponds to MSE2 for the MAP becomes 

iMAv{k){l - 2gk) 



MSE2 = ^ 



k9k 



fc=i 



1 



9k 



Continuing with the same definitions as in the previous section, we prove a similar theorem for the expected 
MSE of the MAP estimator. 



Theorem 6. Denote Gk = Vl — c?Pk/{l — Pk), and let m be the index corresponding to an a priori least 
likely atom, i.e., Pm = inini<fe<„ Pk and hence, Gm — niini<fe<„ Gk. Define the function 



fMAp{s) 



2s 

2s 



GmC" < 1 
Grae' > 1 



(62) 



and define (implicitly) s* = argmaXs>o /map (s)- Then 

1. r* = fMAp{s*) is an upper-bound on the ratio MSE2/MSE1. 

2. The worst ratio, r* , satisfies the explicit bound 



2 In ^ Gm < e 



2 

G„e 



1 « 0.368 
-1 ~ 0.368 



(63) 



Gm > e 

Proof. The proof follows the same lines as that of Theorem S) Starting with the ratio r, we exploit Lemma 
[3] and obtain 

c'ELll^bkll + lMAvikV-^ 



MSE2 

MSEi 



< 



9 9 

T,k=i 9k 



1 + Wp(fc)i-|2i 



(J l<k<n 

r2 



9k 



< ■ max Bi. 

l<k<n 



1 + lMAv{k) 



l-2gk 



9k 



(64) 
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Again using the relation gk — Qk/i^ + Qk) and the definition of qk from (pij) . we have that 
l-25fc 1 . 1 . 1 



gk 



1 

Qk 



1 



Gk exp 



1 



Gk exp{s} 



1, 



(65) 



where we have used the definition of s as before (s = c^/3^/2(T^). Plugged back into Equation (|64l) . we obtain 



MSE2 

< max 2s 

MSEi ~ l<k<n 



1 



^MAV 



(fc) 



1 



1 



(66) 



Gk cxp{s} 

For any fixed value of s, the maximum over the indices 1 < fc < n is obtained for the index m for which Gk is 
the smallest. Therefore, maximizing this expression with respect to both k and s yields 



MSE-2 



max ■ 



/3 MSEi 



< max 2s 

s>0 



< max 

s>0 



'■MAV 



(to) 



1 



Gm exp{s} 
2s Gm exp{s} < 1 
Gm exp{s} > 1 



1 



(67) 



Here we have used the fact that I^^-p(to) = 1 when the atom to is part of the MAP support, which takes 
place if <Zm > 1 (see the discussion after Equation (|35I) V 

We turn to the second claim of the theorem, and calculate explicitly the value s* for which fMAp{s*) = r* 
is maximized. The switch-over between the two cases of /a/ap(s) occurs when Gm^'^ = 1, that is, s = so = 
ln(l/Gm). For s < sq, fMAp{s) = 2s, which is monotonically increasing. For s > sq, fMAp{s) ~ 2s/ {GmC^), 
whose derivative is given by /'(s) = (2 — 2s)/(G„ie*). Thus, if sq > 1, the maximum of / occurs at s* = sq, 
that is, /AfAp(so) = 21n(l/Gm). Otherwise, the maximum occurs at s* = 1 with /ma_p(1) — 2/(Gme). This 
proves the explicit upper bound r* as given in Equation □ 

Figure |4] shows two examples of fMAp{s) for two possible values of Gm- 0.2 and 0.8. These two cases 
correspond to the two options covered in Equation (|63|) . As can be seen, for Gm — 0.2 < 0.368, the maximum 
point of Jmap is obtained at the switch-over point, whereas in the case of Gm — 0.8 > 0.368, the maximum 
is found at s = 1. 

Figure [5] presents the value of r* as a function of Gm for both the MAP and the MMSE. This figure also 
shows the two sub- functions that construct r* for the MAP, as described in Equation ([63|). 



Corollary 7. The expected MSE error for the MAP estimator is bounded for any signal y by 
MSE(x^^^^) <MSE(xO'--''^) > ^ + Gm<e-^ 



^+Gt^ Gm>e- 



Proof. Follows from Theorem[6l 



□ 



When all the probabilities P^ are equivalent, and hence Gi = • • • = G„, we get again that the worst ratio 
r* becomes a tight upper bound on AISE2/MSE1, following the same reasoning as explained in the MMSE 
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s 



Figure 4: A plot of the function ,fMAp{s), exhibiting the two cases, where the maximum changes character 
according to the value Gm- 
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The MAP r* 




. 2Zrj(l/G™) 












The MMSE r* 











10 ' 10"^ 10"^ 10° io' 



Figure 5: The worst ratio r* for MAP, as given in Equation (|63|. This graph also shows the two portions of 
this function, the location of the switch between them, and the MMSE ratio r* . 
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case. The worst-case /3fc's are all given by 



-1- „2 C- ^ 



4.4. MMSE and MAP Bounds - A Summary 

The bounds developed above suggest that both the MMSE and the MAP estimators lead in the unitary 
case to a mean-squared error that is at worst a constant times the oracle MSE. The analysis given above 
provides exact expressions for these ratios. 

We should note that the bounds developed above are based on a worst-case scenario. A more practical goal 
would be to bound the average case, as this should tell us more about the behavior of real-life signals. We 
leave this topic to future work. 

As a last point in this section, we consider the following question: When are the MAP and MMSE nearly 
equivalent? Recall that the errors of these two estimators are given in Equations (|Tf)) and P5)) as 



k=l k^l 

n 71 

MSE(x^^-^^) = ^cV2g,-|-c4^/32[5fc+Wp(fc)(l-2fffc)]- 

fc=l k=l 

In order for these two errors to be close, we should therefore impose for all k 

9k- gl- 9k + lMAv{k){l - 2gk) gl ~ 2lMAv{k)gk + lMAv{k) « 0. (69) 



If Pk 0, this leads to gk 0, since gk (7fc/(l + Qk) and qk = %/l - c^Pk/il - Pk) ■ expc2/3^/2CT^ 
From Equation we also have that lMAv{k) = 0, implying that this index is not part of the MAP support. 
Returning to the requirement posed in Equation we obtain the condition « 0, which is readily satisfied. 
Thus, we conclude that one case where the two estimators, MAP and MMSE, align, is when Pk — > 0. 

When Pfe ^ 1, this leads to gk —t' 1. Relying again on Equation (|36l) we also have that iMAv(k) = 1 
this time, implying that this index is now part of the MAP support. Returning to the requirement posed in 
Equation (IMj) . we obtain the condition — 2gk + 1 = {gk — 1)^ ~ 0, again satisfied (since gk is close to 1. 
Thus, another case where the two estimators align is when — >■ 1. 

5. Experimental Results 

Here we demonstrate the MAP and MMSE estimators for unitary dictionaries and provide both synthetic 
and real-signal experiments to illustrate these algorithms. 

5.1. Synthetic Experiments 



In the first experiment we use a 2D Wavelet dictionary D (Daubachies-5 filters) [lOj, with 3 levels of 
resolution. We choose all the atom probabilities Pi and all the variances (Ti to be the same in this test. We 
use P = 0.1 and ax — 1- 
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Generating a two-dimensional signal according to the proposed model is done by first randomly choosing 
whether each atom is part of the support or not with probability P. For the selected atoms, coefficients 
Xi are drawn independently from a normal distribution A/'(0,ct^). The resulting sparse vector of coefficients 
is multiplied by the unitary dictionary to obtain the ground-truth two-dimensional signal. Each entry is 
independently contaminated by white Gaussian noise Af{0,a^) to create the input signal y. The values of 
the additive noise power, a, are varied in the range [0.1, 1] to demonstrate the effect of the noise level on the 
overall performance. Each of the (noisy) signals is then approximated using the following estimators: 

1. Empirical Oracle estimation and its MSE. This estimator appears in Equation (|27p . 

2. Theoretical Oracle estimation error, as given in Equation (|42l) . 

3. Empirical MMSE estimation and its MSE. We use Equation PO)) in order to compute the estimation, 
and then assess its error empirically. 

4. Theoretical MMSE estimation error, using Equation p5| directly. 

5. Empirical MAP estimation and its MSE. We use the closed-form solution given in Equation ([55)1 . 

6. Theoretical MAP estimation error, as given in Equation (|49p . 

The above process is repeated for 1000 randomly generated signals of size 128 x 128, and the mean L2 error 
is averaged over all signals to obtain an estimate of the expected quality of each estimator. Figure |6] shows 
the relative denoising effect (compared to the original noisy signal) achieved by each estimator. The improved 
performance of the MMSE estimator over the MAP is clearly seen, as well as a clear validation of the theoretical 
derivations. 



• 1 . Oracle Emp. 

• 2. MMSE Emp. 




8.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 

a 

Figure 6: Empirical and theoretical evaluations of the MSE as a function of the input noise for synthetic 
signals (P = 0.1, a,, = 1, and n = 128 x 128). 
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5.2. Real- World Signals 

Next, we experiment with real-world signals - images. The unitary dictionary for this experiment is the 
same 2D Wavelet Transform dictionary used in the synthetic experiment. This dictionary is known to serve 
natural image content adequately (i.e., sparsify image content). There are two main obstacles when aiming to 
operate on non-synthetic signals: 

1. The assumption that all the non-zero entries in x share the same variance is inadequate, and we should 
generalize the above discussion to a heteroscedastic model. 

2. The parameters that describe the signal model are unknown and need to be estimated from the corrupted 
signal. 

Our handing of these two issues is described in detail in Appendix A. 

It is important to note that our main goal in this experiment is to demonstrate the power of the MMSE and 
the MAP estimators, and their comparison. We do not attempt to compare these results to state-of-the-art 
image denoising algorithms, as the current model is too limited for this comparison to be fair, due to the 
non-adaptiveness and the unitarity of the dictionary. 

We experiment with the image Peppers shown in Figure lTOl The noise levels considered are: 5, 10, 15,... ,70, 
where the pixel values are in the range [0, 255]. The relative MSE of the cleaned image compared to the noisy 
one appears in Figure [71 as a function of the input noise power. Per each cr, the parameters are estimated, 
and then used within the MAP and the MMSE estimators. 

Clearly, the MMSE outperforms the MAP for all the noise levels, the gap being bigger for high SNR levels. 
Nevertheless, it is also evident from this graph that the difference between the two is relatively small. FigurelH 
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(a) Pi 



(b) a. 



Figure 8: Estimated parameters for the "Peppers" image with noise a — 10. 

shows the estimated parameters learnt from the noisy frame for each band, and the values of these parameter 
may provide an explanation for this phenomenon. 

As we have observed in the previous section, the gap between the MMSE and the MAP is expected to be 
negligible if Pk are nearly zeros or ones. This means that among the 10 bands in the wavelet transform, the 
three high-resolution and the single low-resolution bands are expected to give the same performance for both 
estimators. This suggests that the difference between the MAP and the MMSE is only due to the image energy 
that resides in the 6 middle-bands. Figure |9] shows the actual errors per band, as obtained by the MMSE and 
the MAP, and indeed, as expected, the difference in these errors exists mostly in the 6 middle bands. 

Finally, a visual comparison of the results of the different estimators is presented in Figure [TOl for the image 
Peppers, to which white Gaussian noise with cr = 10 is added. As expected, the MMSE result shows a small 
visual improvement over the MAP. 

6. Summary and Conclusions 

In this work we have studied a model where each atom has a given probability to be part of the support. 
This model assumes that all the supports are possible, thus avoiding assumptions on the (generally unknown) 
support size. We study MAP and MMSE estimators for the model with a general dictionary, including an 
overview of their performance. Then, we focus on unitary dictionaries, for which both estimators have simple 
and accurate closed formulas for their computation. After developing the closed-form MAP and MMSE 
estimators, it is shown how can they be interpreted in terms of shrinkage. We describe the relation of the 
MAP and MMSE estimators in this model to existing models appearing in the literature. This development is 
extended by looking at the theoretical performance of the estimators. Here, analytical bounds on the worst- 
case denoising performance is shown. Finally, synthetic and real-world experiments show the performance of 
the estimators, and the clear advantage of MMSE estimator over MAP estimator. 
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(a) MMSE (b) MAP 

Figure 9: The error per band for the MMSE and the MAP estimators. 




(a) Ground truth image 



(c) MAP (PSNR 32.33dB) 




(b) Noisy image (PSNR 28.12dB) 




(d) MMSE (PSNR Sa.OldB) 



Figure 10: Visual comparison of the reconstructed image by the MAP and MMSE estimators (a = 10). 
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Appendix A Handling Images 



As mentioned in Section [51 in order to handle a given noisy image, we should extend the model to allow for 
distinct variances for the different atoms, and we should also estimate the model parameters from the image. 
This appendix describes these two tasks. 

A.l. Extension to Heteroscedastic Model 

In the derivations in this paper we have assumed that all the non-zero entries in x have the same variance. 
As this is rarely the case for natural images, we treat now a more general problem, where this variance is 
atom-dependent. Such a model is known as heteroscedastic. Our goal is to show that most of the results 
remain of similar form, with modest changes. Thus, we shall keep the discussion in the section brief, and only 
state the main results. 

We change the covariance matrix in Equation to be a more general diagonal matrix V5, given by 



where k — \S\. For the general estimators developed in section [521 the changes due to this generalization are 
all absorbed in the matrices Q5 and C5, becoming 




(A-1) 



Q5 



D5V5D^ + a2l„, 
+ ;^D^D5, 



and the relation between them in Equation Q is still valid. 

Moving to the unitary case, the matrix Q5 is a diagonal matrix of the form 




(A-2) 



Its inversion, Q^^, can easily be calculated, and the oracle solution becomes 




(A-3) 



where 



(t|./(it|. -I- a^). The support of the MAP estimator is given by 



(A-4) 



Lastly, the unitary MMSE estimate presented in Equation (j40|) becomes 



n 




(A-5) 
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A. 2. Parameter Estimation 

The parameters of the image generation model are not known in advance and thus they should be estimated. 
We shall assume that each band in the wavelet transform is characterized by a pair of parameters (Ji,Pi, and 
there are r such bands overall (10 in the experiment reported in Section [5]). We propose to estimate these 
parameters directly from the noisy image, by performing the following optimization task: 

arg max P (y |{P., a^U ) ■ (A-6) 

{fi,<Ti}r=i 

Marginalization of this likelihood term with respect to the support of the image in the wavelet domain reads 

P (y \{P.. ^JLi ) = Y.P{y\S, {P„ a,}U )-P{S |{P., aJLi ) • (A-7) 

As maximization of this summation may be computationally difficult, we turn to approximate it by considering 
only one item - the dominant one within this sum. Thus, we propose to solve 

arg max P (y |{P., ^ jr=i ) (A-8) 

{f'i.o-ilLi 

«arg max P (y |5, {P„ aj^ ) • P (5 |{P„ aJU ) • P (5) , 

where we maximize with respect to the support as well. Note that we have introduced a prior on the support 
size, P [S) . We shall use the form 

r 

with Si the support in the i-th band. This prior controls the support sparsity in each band, and as we show 
next, it stabilizes the estimation procedure. The values are set to be high for low-frequency bands, and 
decrease for the higher frequency bands. 

We use the model definitions in Section 12.11 in order to develop an expression that depends only on the 
parameters of the r bands. Starting with P {S |{Pi, <JiYi=i ): g^t 

r 

P {S |{P., aJLi ) = n (1 - P.)"""'^'' , (A-9) 
1=1 

where rii is the size of the «-th band. Using the fact that the wavelet dictionary is unitary and exploiting 
Equation (jA-2[) . we have 



det(C,)=p^ a- = a-n^^--'"nP^ ■ (^"10) 
Plugging Equation ^ and the above expressions into ()A-8p . the parameters estimation task becomes 



Two important features of this expression deserve our attention: First, rather than seeking the support S, 
this expression reveals that all we need are the cardinalities \S\ within each band. Second, this expression is 
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separable with respect to the r bands, implying that we can estimate Pi,ai for the i-th band by solving 

Taking the log of the above expression, we obtain an alternative function to maximize, 

-^logl'^^i^] +|5.|logP. 



+ (n, - log(l - P.) + T^^^ II/35JI' - A,|5.|. (A-11) 

zcr-^ cr^ + (T,r 



To obtain the estimates for ai and Pi we differentiate / with respect to these unknowns. The derivative with 
respect to Pi leads to 

df{\S^lP^ _ \S,\ (n,-|5,|) 

Similarly, the derivative with respect to (Ji gives 

df{\S,\,P„<J,) cr, aj 2 , 2 11^5,11^ 2 

< + (0-2 + 0-2) I'^il 

The last step in this estimation process is to discover the cardinality \Si\. Returning to the expression to 
be maximized in Equation (|A-lip . we can plug in the solutions obtained for Pi and ai, both being functions 
of \Si\. The overall expression is thus a function of the scalar \Si\, and the maximizer value can be found by 
a simple sweep of this unknown in the range [0,71^]. We should note that for every value tested, we should 
also update the vector l3s to include only non-zero elements of Si. Since we are maximizing / {\Si\, Pi,ai), 
we should choose the largest entries (in absolute value) within this vector. After this exhaustive process is 
done, we pick the support size and the respective calculated parameters that maximize the optimization task 
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